<<<<<<< HEAD Microbiome data manipulation

Microbiome data manipulation

Processing phyloseq objects

Instructions to manipulate microbiome data sets using tools from the phyloseq package and some extensions from the microbiome package, including subsetting, aggregating and filtering.

Load example data:

library(phyloseq)
library(microbiome)
library(knitr)
# Load example data
data(atlas1006)   

# Let us rename the example data (which is a phyloseq object)
pseq <- atlas1006

Retrieving data elements from a phyloseq object

A phyloseq object contains OTU table (taxa abundances), sample metadata, taxonomy table (mapping between OTUs and higher-level taxonomic classifications), and phylogenetic tree (relations between the taxa). Some of these are optional.

Pick metadata as data.frame:

meta <- meta(pseq)

Taxonomy table:

taxonomy <- tax_table(pseq)

Abundances for taxonomic groups (‘OTU table’) as a TaxaxSamples matrix:

# Absolute abundances
otu.absolute <- abundances(pseq)

# Relative abundances
otu.relative <- abundances(pseq, "compositional")

Melting phyloseq data for easier plotting:

df <- psmelt(pseq)
kable(head(df))
OTU Sample Abundance age gender nationality DNA_extraction_method project diversity bmi_group subject time sample Phylum Genus
113110 Prevotella melaninogenica et rel. Sample-448 944002 54 female CentralEurope o 18 5.98 lean 448 0 Sample-448 Bacteroidetes Prevotella melaninogenica et rel.
113015 Prevotella melaninogenica et rel. Sample-360 902034 45 female CentralEurope o 13 5.49 severeobese 360 0 Sample-360 Bacteroidetes Prevotella melaninogenica et rel.
112747 Prevotella melaninogenica et rel. Sample-190 862870 34 female CentralEurope r 7 6.06 lean 190 0 Sample-190 Bacteroidetes Prevotella melaninogenica et rel.
113109 Prevotella melaninogenica et rel. Sample-743 852350 52 male US NA 19 5.21 obese 743 0 Sample-743 Bacteroidetes Prevotella melaninogenica et rel.
112944 Prevotella melaninogenica et rel. Sample-366 851147 52 female CentralEurope o 15 5.63 obese 366 0 Sample-366 Bacteroidetes Prevotella melaninogenica et rel.
113639 Prevotella melaninogenica et rel. Sample-375 844482 45 female CentralEurope o 16 5.64 severeobese 375 0 Sample-375 Bacteroidetes Prevotella melaninogenica et rel.

Sample operations

Sample names and variables

head(sample_names(pseq))
## [1] "Sample-1" "Sample-2" "Sample-3" "Sample-4" "Sample-5" "Sample-6"

Total OTU abundance in each sample

s <- sample_sums(pseq)

Abundance of a given species in each sample

head(abundances(pseq)["Akkermansia",])
## Sample-1 Sample-2 Sample-3 Sample-4 Sample-5 Sample-6 
##     1319     2299    29980     3824     2133      864

Select a subset by metadata fields:

pseq.subset <- subset_samples(pseq, nationality == "AFR")

Select a subset by providing sample names:

# Check sample names for African Females in this phyloseq object
s <- rownames(subset(meta(pseq), nationality == "AFR" & sex == "Female"))

# Pick the phyloseq subset with these sample names
pseq.subset2 <- prune_samples(s, pseq)

Pick samples at the baseline time points only:

pseq0 <- baseline(pseq)

Data transformations

The microbiome package provides a wrapper for standard sample/OTU transforms. For arbitrary transforms, use the transform_sample_counts function in the phyloseq package. Log10 transform is log(1+x) if the data contains zeroes. Also “Z”, “clr”, “hellinger”, and “shift” are available as common transformations. Relative abundances (note that the input data needs to be in absolute scale, not logarithmic!):

pseq.compositional <- microbiome::transform(pseq, "compositional")

Variable operations

Sample variable names

sample_variables(pseq)
##  [1] "age"                   "gender"               
##  [3] "nationality"           "DNA_extraction_method"
##  [5] "project"               "diversity"            
##  [7] "bmi_group"             "subject"              
##  [9] "time"                  "sample"

Pick values for a given variable

head(get_variable(pseq, sample_variables(pseq)[1]))
## [1] 28 24 52 22 25 42

Assign new fields to metadata

# Calculate diversity for samples
div <- global(pseq, index = "shannon")

# Assign the estimated diversity to sample metadata
sample_data(pseq)$diversity <- div

Taxa operations

Number of taxa

n <- ntaxa(pseq)

Most abundant taxa

topx <- top_taxa(pseq, n = 10)

Names

ranks <- rank_names(pseq)  # Taxonomic levels
taxa  <- taxa(pseq)        # Taxa names at the analysed level

Subset taxa

pseq.bac <- subset_taxa(pseq, Phylum == "Bacteroidetes")

Prune (select) taxa:

# List of Genera in the Bacteroideted Phylum
taxa <- map_levels(NULL, "Phylum", "Genus", pseq)$Bacteroidetes

# With given taxon names
ex2 <- prune_taxa(taxa, pseq)

# Taxa with positive sum across samples
ex3 <- prune_taxa(taxa_sums(pseq) > 0, pseq)

Filter by user-specified function values (here variance):

f <- filter_taxa(pseq, function(x) var(x) > 1e-05, TRUE)

List unique phylum-level groups:

head(get_taxa_unique(pseq, "Phylum"))
## [1] "Actinobacteria"         "Bacilli"               
## [3] "Proteobacteria"         "Verrucomicrobia"       
## [5] "Bacteroidetes"          "Clostridium cluster XV"

Pick the taxa abundances for a given sample:

samplename <- sample_names(pseq)[[1]]

# Pick abundances for a particular taxon
tax.abundances <- abundances(pseq)[, samplename]

Merging operations

Aggregate taxa to higher taxonomic levels. This is particularly useful if the phylogenetic tree is missing. When it is available, see merge_samples, merge_taxa and tax_glom).

Put the less abundant taxa together in the “Other” category:

pseq2 <- aggregate_taxa(pseq, "Phylum", top = 5) 

Merge the desired taxa into “Other” category. Here, we merge all Bacteroides groups into a single group named Bacteroides.

pseq2 <- merge_taxa2(pseq, pattern = "^Bacteroides", name = "Bacteroides") 

Merging phyloseq objects with the phyloseq package:

merge_phyloseq(pseqA, pseqB)

Rarification

pseq.rarified <- rarefy_even_depth(pseq)

Taxonomy

Convert between taxonomic levels (here from Genus (Akkermansia) to Phylum (Verrucomicrobia):

m <- map_levels("Akkermansia", "Genus", "Phylum", tax_table(pseq))
print(m)
## [1] "Verrucomicrobia"

Metadata

Visualize frequencies of given factor (sex) levels within the indicated groups (group):

res <- plot_frequencies(sample_data(pseq), "bmi_group", "gender")
print(res$plot)

# Retrieving the actual data values:
kable(head(res$data), digits = 2)
Groups Factor n pct
underweight female 21 91.30
underweight male 2 8.70
lean female 304 61.66
lean male 189 38.34
overweight female 102 50.00
overweight male 102 50.00

Custom functions are provided to cut age or BMI information into discrete classes.

group_bmi(c(22, 28, 31), "standard")
## [1] lean       overweight obese     
## Levels: underweight lean overweight obese severe morbid super
group_age(c(17, 41, 102), "decades")
## [1] [10,20)   [40,50)   [100,110]
## 10 Levels: [10,20) [20,30) [30,40) [40,50) [50,60) [60,70) ... [100,110]

Add metadata to a phyloseq object. For reproducibility, we just use the existing metadata in this example, but this can be replaced by another data.frame (samples x fields).

# Example data
data(dietswap)
pseq <- dietswap

# Pick the existing metadata from a phyloseq object
# (or retrieve this from another source)
df <- meta(pseq)

# Merge the metadata back in the phyloseq object
pseq2 <- merge_phyloseq(pseq, sample_data(df))

Leo Lahti, Sudarshan Shetty et al. 2018-05-20

||||||| merged common ancestors Microbiome data manipulation

Microbiome data manipulation

Processing phyloseq objects

Instructions to manipulate microbiome data sets using tools from the phyloseq package and some extensions from the microbiome package, including subsetting, aggregating and filtering.

Load example data:

library(phyloseq)
library(microbiome)
library(knitr)
# Load example data
data(atlas1006)   

# Let us rename the example data (which is a phyloseq object)
pseq <- atlas1006

Retrieving data elements from a phyloseq object

A phyloseq object contains OTU table (taxa abundances), sample metadata, taxonomy table (mapping between OTUs and higher-level taxonomic classifications), and phylogenetic tree (relations between the taxa). Some of these are optional.

Pick metadata as data.frame:

meta <- meta(pseq)

Taxonomy table:

taxonomy <- tax_table(pseq)

Abundances for taxonomic groups (‘OTU table’) as a TaxaxSamples matrix:

# Absolute abundances
otu.absolute <- abundances(pseq)

# Relative abundances
otu.relative <- abundances(pseq, "compositional")

Melting phyloseq data for easier plotting:

df <- psmelt(pseq)
kable(head(df))
OTU Sample Abundance age gender nationality DNA_extraction_method project diversity bmi_group subject time sample Phylum Genus
113110 Prevotella melaninogenica et rel. Sample-448 944002 54 female CentralEurope o 18 5.98 lean 448 0 Sample-448 Bacteroidetes Prevotella melaninogenica et rel.
113015 Prevotella melaninogenica et rel. Sample-360 902034 45 female CentralEurope o 13 5.49 severeobese 360 0 Sample-360 Bacteroidetes Prevotella melaninogenica et rel.
112747 Prevotella melaninogenica et rel. Sample-190 862870 34 female CentralEurope r 7 6.06 lean 190 0 Sample-190 Bacteroidetes Prevotella melaninogenica et rel.
113109 Prevotella melaninogenica et rel. Sample-743 852350 52 male US NA 19 5.21 obese 743 0 Sample-743 Bacteroidetes Prevotella melaninogenica et rel.
112944 Prevotella melaninogenica et rel. Sample-366 851147 52 female CentralEurope o 15 5.63 obese 366 0 Sample-366 Bacteroidetes Prevotella melaninogenica et rel.
113639 Prevotella melaninogenica et rel. Sample-375 844482 45 female CentralEurope o 16 5.64 severeobese 375 0 Sample-375 Bacteroidetes Prevotella melaninogenica et rel.

Sample operations

Sample names and variables

head(sample_names(pseq))
## [1] "Sample-1" "Sample-2" "Sample-3" "Sample-4" "Sample-5" "Sample-6"

Total OTU abundance in each sample

s <- sample_sums(pseq)

Abundance of a given species in each sample

head(abundances(pseq)["Akkermansia",])
## Sample-1 Sample-2 Sample-3 Sample-4 Sample-5 Sample-6 
##     1319     2299    29980     3824     2133      864

Select a subset by metadata fields:

pseq.subset <- subset_samples(pseq, nationality == "AFR")

Select a subset by providing sample names:

# Check sample names for African Females in this phyloseq object
s <- rownames(subset(meta(pseq), nationality == "AFR" & sex == "Female"))

# Pick the phyloseq subset with these sample names
pseq.subset2 <- prune_samples(s, pseq)

Pick samples at the baseline time points only:

pseq0 <- baseline(pseq)

Data transformations

The microbiome package provides a wrapper for standard sample/OTU transforms. For arbitrary transforms, use the transform_sample_counts function in the phyloseq package. Log10 transform is log(1+x) if the data contains zeroes. Also “Z”, “clr”, “hellinger”, and “shift” are available as common transformations. Relative abundances (note that the input data needs to be in absolute scale, not logarithmic!):

pseq.compositional <- microbiome::transform(pseq, "compositional")

Variable operations

Sample variable names

sample_variables(pseq)
##  [1] "age"                   "gender"               
##  [3] "nationality"           "DNA_extraction_method"
##  [5] "project"               "diversity"            
##  [7] "bmi_group"             "subject"              
##  [9] "time"                  "sample"

Pick values for a given variable

head(get_variable(pseq, sample_variables(pseq)[1]))
## [1] 28 24 52 22 25 42

Assign new fields to metadata

# Calculate diversity for samples
div <- global(pseq, index = "shannon")

# Assign the estimated diversity to sample metadata
sample_data(pseq)$diversity <- div

Taxa operations

Number of taxa

n <- ntaxa(pseq)

Most abundant taxa

topx <- top_taxa(pseq, n = 10)

Names

ranks <- rank_names(pseq)  # Taxonomic levels
taxa  <- taxa(pseq)        # Taxa names at the analysed level

Subset taxa

pseq.bac <- subset_taxa(pseq, Phylum == "Bacteroidetes")

Prune (select) taxa:

# List of Genera in the Bacteroideted Phylum
taxa <- map_levels(NULL, "Phylum", "Genus", pseq)$Bacteroidetes

# With given taxon names
ex2 <- prune_taxa(taxa, pseq)

# Taxa with positive sum across samples
ex3 <- prune_taxa(taxa_sums(pseq) > 0, pseq)

Filter by user-specified function values (here variance):

f <- filter_taxa(pseq, function(x) var(x) > 1e-05, TRUE)

List unique phylum-level groups:

head(get_taxa_unique(pseq, "Phylum"))
## [1] "Actinobacteria"         "Bacilli"               
## [3] "Proteobacteria"         "Verrucomicrobia"       
## [5] "Bacteroidetes"          "Clostridium cluster XV"

Pick the taxa abundances for a given sample:

samplename <- sample_names(pseq)[[1]]

# Pick abundances for a particular taxon
tax.abundances <- abundances(pseq)[, samplename]

Merging operations

Aggregate taxa to higher taxonomic levels. This is particularly useful if the phylogenetic tree is missing. When it is available, see merge_samples, merge_taxa and tax_glom).

Put the less abundant taxa together in the “Other” category:

pseq2 <- aggregate_taxa(pseq, "Phylum", top = 5) 

Merge the desired taxa into “Other” category. Here, we merge all Bacteroides groups into a single group named Bacteroides.

pseq2 <- merge_taxa2(pseq, pattern = "^Bacteroides", name = "Bacteroides") 

Merging phyloseq objects with the phyloseq package:

merge_phyloseq(pseqA, pseqB)

Rarification

pseq.rarified <- rarefy_even_depth(pseq)

Taxonomy

Convert between taxonomic levels (here from Genus (Akkermansia) to Phylum (Verrucomicrobia):

m <- map_levels("Akkermansia", "Genus", "Phylum", tax_table(pseq))
print(m)
## [1] "Verrucomicrobia"

Metadata

Visualize frequencies of given factor (sex) levels within the indicated groups (group):

res <- plot_frequencies(sample_data(pseq), "bmi_group", "gender")
print(res$plot)

# Retrieving the actual data values:
kable(head(res$data), digits = 2)
Groups Factor n pct
underweight female 21 91.30
underweight male 2 8.70
lean female 304 61.66
lean male 189 38.34
overweight female 102 50.00
overweight male 102 50.00

Custom functions are provided to cut age or BMI information into discrete classes.

group_bmi(c(22, 28, 31), "standard")
## [1] lean       overweight obese     
## Levels: underweight lean overweight obese severe morbid super
group_age(c(17, 41, 102), "decades")
## [1] [10,20)   [40,50)   [100,110]
## 10 Levels: [10,20) [20,30) [30,40) [40,50) [50,60) [60,70) ... [100,110]

Add metadata to a phyloseq object. For reproducibility, we just use the existing metadata in this example, but this can be replaced by another data.frame (samples x fields).

# Example data
data(dietswap)
pseq <- dietswap

# Pick the existing metadata from a phyloseq object
# (or retrieve this from another source)
df <- meta(pseq)

# Merge the metadata back in the phyloseq object
pseq2 <- merge_phyloseq(pseq, sample_data(df))

Leo Lahti, Sudarshan Shetty et al. 2018-05-20

======= Microbiome data manipulation

Microbiome data manipulation

Processing phyloseq objects

Instructions to manipulate microbiome data sets using tools from the phyloseq package and some extensions from the microbiome package, including subsetting, aggregating and filtering.

Load example data:

library(phyloseq)
library(microbiome)
library(knitr)
# Load example data
data(atlas1006)   

# Let us rename the example data (which is a phyloseq object)
pseq <- atlas1006

Summarizing the contents of a phyloseq object

summarize_phyloseq(pseq)
## Compositional = NO
## 1] Min. number of reads = 119895 
## 2] Max. number of reads = 1822706 
## 3] Total number of reads = 868233528 
## 4] Average number of reads = 740813.590443686 
## 5] Median number of reads = 700826 
## 7] Sparsity = 0 
## 6] Any OTU sum to 1 or less? NO 
## 8] Number of singletons = 0 
## 9] Percent of OTUs that are singletons 0 
## 10] Number of sample variables are: 10 
## age 
## gender 
## nationality 
## DNA_extraction_method 
## project 
## diversity 
## bmi_group 
## subject 
## time 
## sample

Retrieving data elements from a phyloseq object

A phyloseq object contains OTU table (taxa abundances), sample metadata, taxonomy table (mapping between OTUs and higher-level taxonomic classifications), and phylogenetic tree (relations between the taxa). Some of these are optional.

Pick metadata as data.frame:

meta <- meta(pseq)

Taxonomy table:

taxonomy <- tax_table(pseq)

Abundances for taxonomic groups (‘OTU table’) as a TaxaxSamples matrix:

# Absolute abundances
otu.absolute <- abundances(pseq)

# Relative abundances
otu.relative <- abundances(pseq, "compositional")

Melting phyloseq data for easier plotting:

df <- psmelt(pseq)
kable(head(df))
OTU Sample Abundance age gender nationality DNA_extraction_method project diversity bmi_group subject time sample Phylum Genus
113110 Prevotella melaninogenica et rel. Sample-448 944002 54 female CentralEurope o 18 5.98 lean 448 0 Sample-448 Bacteroidetes Prevotella melaninogenica et rel.
113015 Prevotella melaninogenica et rel. Sample-360 902034 45 female CentralEurope o 13 5.49 severeobese 360 0 Sample-360 Bacteroidetes Prevotella melaninogenica et rel.
112747 Prevotella melaninogenica et rel. Sample-190 862870 34 female CentralEurope r 7 6.06 lean 190 0 Sample-190 Bacteroidetes Prevotella melaninogenica et rel.
113109 Prevotella melaninogenica et rel. Sample-743 852350 52 male US NA 19 5.21 obese 743 0 Sample-743 Bacteroidetes Prevotella melaninogenica et rel.
112944 Prevotella melaninogenica et rel. Sample-366 851147 52 female CentralEurope o 15 5.63 obese 366 0 Sample-366 Bacteroidetes Prevotella melaninogenica et rel.
113639 Prevotella melaninogenica et rel. Sample-375 844482 45 female CentralEurope o 16 5.64 severeobese 375 0 Sample-375 Bacteroidetes Prevotella melaninogenica et rel.

Sample operations

Sample names and variables

head(sample_names(pseq))
## [1] "Sample-1" "Sample-2" "Sample-3" "Sample-4" "Sample-5" "Sample-6"

Total OTU abundance in each sample

s <- sample_sums(pseq)

Abundance of a given species in each sample

head(abundances(pseq)["Akkermansia",])
## Sample-1 Sample-2 Sample-3 Sample-4 Sample-5 Sample-6 
##     1319     2299    29980     3824     2133      864

Select a subset by metadata fields:

pseq.subset <- subset_samples(pseq, nationality == "AFR")

Select a subset by providing sample names:

# Check sample names for African Females in this phyloseq object
s <- rownames(subset(meta(pseq), nationality == "AFR" & sex == "Female"))

# Pick the phyloseq subset with these sample names
pseq.subset2 <- prune_samples(s, pseq)

Pick samples at the baseline time points only:

pseq0 <- baseline(pseq)

Data transformations

The microbiome package provides a wrapper for standard sample/OTU transforms. For arbitrary transforms, use the transform_sample_counts function in the phyloseq package. Log10 transform is log(1+x) if the data contains zeroes. Also “Z”, “clr”, “hellinger”, and “shift” are available as common transformations. Relative abundances (note that the input data needs to be in absolute scale, not logarithmic!):

pseq.compositional <- microbiome::transform(pseq, "compositional")

Variable operations

Sample variable names

sample_variables(pseq)
##  [1] "age"                   "gender"               
##  [3] "nationality"           "DNA_extraction_method"
##  [5] "project"               "diversity"            
##  [7] "bmi_group"             "subject"              
##  [9] "time"                  "sample"

Pick values for a given variable

head(get_variable(pseq, sample_variables(pseq)[1]))
## [1] 28 24 52 22 25 42

Assign new fields to metadata

# Calculate diversity for samples
div <- global(pseq, index = "shannon")

# Assign the estimated diversity to sample metadata
sample_data(pseq)$diversity <- div

Taxa operations

Number of taxa

n <- ntaxa(pseq)

Most abundant taxa

topx <- top_taxa(pseq, n = 10)

Names

ranks <- rank_names(pseq)  # Taxonomic levels
taxa  <- taxa(pseq)        # Taxa names at the analysed level

Subset taxa

pseq.bac <- subset_taxa(pseq, Phylum == "Bacteroidetes")

Prune (select) taxa:

# List of Genera in the Bacteroideted Phylum
taxa <- map_levels(NULL, "Phylum", "Genus", pseq)$Bacteroidetes

# With given taxon names
ex2 <- prune_taxa(taxa, pseq)

# Taxa with positive sum across samples
ex3 <- prune_taxa(taxa_sums(pseq) > 0, pseq)

Filter by user-specified function values (here variance):

f <- filter_taxa(pseq, function(x) var(x) > 1e-05, TRUE)

List unique phylum-level groups:

head(get_taxa_unique(pseq, "Phylum"))
## [1] "Actinobacteria"         "Bacilli"               
## [3] "Proteobacteria"         "Verrucomicrobia"       
## [5] "Bacteroidetes"          "Clostridium cluster XV"

Pick the taxa abundances for a given sample:

samplename <- sample_names(pseq)[[1]]

# Pick abundances for a particular taxon
tax.abundances <- abundances(pseq)[, samplename]

Merging operations

Aggregate taxa to higher taxonomic levels. This is particularly useful if the phylogenetic tree is missing. When it is available, see merge_samples, merge_taxa and tax_glom).

Put the less abundant taxa together in the “Other” category:

pseq2 <- aggregate_taxa(pseq, "Phylum", top = 5) 

Merge the desired taxa into “Other” category. Here, we merge all Bacteroides groups into a single group named Bacteroides.

pseq2 <- merge_taxa2(pseq, pattern = "^Bacteroides", name = "Bacteroides") 

Merging phyloseq objects with the phyloseq package:

merge_phyloseq(pseqA, pseqB)

Rarification

pseq.rarified <- rarefy_even_depth(pseq)

Taxonomy

Convert between taxonomic levels (here from Genus (Akkermansia) to Phylum (Verrucomicrobia):

m <- map_levels("Akkermansia", "Genus", "Phylum", tax_table(pseq))
print(m)
## [1] "Verrucomicrobia"

Metadata

Visualize frequencies of given factor (sex) levels within the indicated groups (group):

res <- plot_frequencies(sample_data(pseq), "bmi_group", "gender")
print(res$plot)

# Retrieving the actual data values:
kable(head(res$data), digits = 2)
Groups Factor n pct
underweight female 21 91.30
underweight male 2 8.70
lean female 304 61.66
lean male 189 38.34
overweight female 102 50.00
overweight male 102 50.00

Custom functions are provided to cut age or BMI information into discrete classes.

group_bmi(c(22, 28, 31), "standard")
## [1] lean       overweight obese     
## Levels: underweight lean overweight obese severe morbid super
group_age(c(17, 41, 102), "decades")
## [1] [10,20)   [40,50)   [100,110]
## 10 Levels: [10,20) [20,30) [30,40) [40,50) [50,60) [60,70) ... [100,110]

Add metadata to a phyloseq object. For reproducibility, we just use the existing metadata in this example, but this can be replaced by another data.frame (samples x fields).

# Example data
data(dietswap)
pseq <- dietswap

# Pick the existing metadata from a phyloseq object
# (or retrieve this from another source)
df <- meta(pseq)

# Merge the metadata back in the phyloseq object
pseq2 <- merge_phyloseq(pseq, sample_data(df))

Leo Lahti, Sudarshan Shetty et al. 2018-05-21

>>>>>>> 02a10c7a130e3188cf9241e929bfe1698d1db69a